EqSpike: Spike-driven equilibrium propagation for neuromorphic implementations
Authors
Abstract
• EqSpike is a spiking neural network version of equilibrium propagation
• It achieves 97.6% test accuracy on MNIST with a fully connected architecture
• Its two-factor local learning rule is compatible with neuromorphic hardware
• Its weight updates exhibit a form of spike-timing-dependent plasticity

Finding spike-based learning algorithms that can be implemented within the constraints of neuromorphic systems, while achieving high accuracy, remains a formidable challenge. Equilibrium propagation is a promising alternative to backpropagation as it only involves local computations, but hardware-oriented studies have so far focused on rate-based networks. In this work, we develop a spiking algorithm called EqSpike, which learns by equilibrium propagation. Through simulations, we obtain a test recognition accuracy of 97.6% on the MNIST handwritten digits dataset (Mixed National Institute of Standards and Technology), similar to rate-based equilibrium propagation and comparing favorably to other learning techniques for spiking neural networks. We show that EqSpike implemented in silicon neuromorphic technology could reduce the energy consumption of inference and training by, respectively, three and two orders of magnitude compared to graphics processing units. Finally, we also show that during learning, the weight updates exhibit a form of spike-timing-dependent plasticity, highlighting a possible connection to biology.

Spike-based systems have, in recent years, demonstrated outstanding efficiency on inference tasks (Merolla et al., 2014). Implementing the training of deep networks on such systems remains, however, a considerable challenge: backpropagation does not apply directly, as it requires spatially non-local computations that go against the principles of neuromorphic systems. A large number of studies instead use unsupervised, biologically inspired spike-timing-dependent plasticity (STDP), because its weight updates, based on the relative timing of pre- and post-synaptic spikes, can be achieved with compact circuits in several technologies (Bi and Poo, 2001; Masquelier and Thorpe, 2007; Bichler et al., 2012; Zamarreño-Ramos et al., 2011; Jo et al., 2010; Pedretti et al., 2017; Serb et al., 2016; Prezioso et al., 2018; Thakur et al., 2018; Feldmann et al., 2019).
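The timing-based character of STDP mentioned above can be illustrated with a minimal pair-based rule. The exponential window below and its parameters (a_plus, a_minus, tau_plus, tau_minus) are illustrative assumptions for a generic STDP sketch, not values from the paper.

    import math

    def pair_stdp_update(t_pre, t_post, a_plus=0.01, a_minus=0.012,
                         tau_plus=20e-3, tau_minus=20e-3):
        """Weight change for a single pre/post spike pair (pair-based STDP).

        Potentiation when the pre-synaptic spike precedes the post-synaptic one,
        depression otherwise; amplitudes and time constants are illustrative.
        """
        dt = t_post - t_pre
        if dt >= 0:  # pre before post -> potentiation
            return a_plus * math.exp(-dt / tau_plus)
        return -a_minus * math.exp(dt / tau_minus)  # post before pre -> depression

    # A pre-spike arriving 5 ms before a post-spike strengthens the synapse.
    print(pair_stdp_update(t_pre=0.000, t_post=0.005))

The update depends only on the two spike times, which is what makes such two-factor rules attractive for compact synaptic circuits.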
Unfortunately, STDP rules generally do not minimize a global objective function of the network, and the accuracy of STDP-trained networks remains below the state of the art (Falez et al., 2019). Important research efforts therefore investigate mathematically modified versions of backpropagation to make it appropriate for spiking networks (Neftci et al., 2017; Sacramento et al., 2018; Richards et al., 2019; Neftci et al., 2019; Kaiser et al., 2020; Bellec et al., 2020; Payeur et al., 2020). The learning rules derived in this way are composed of three factors. The first two take into account, as usual, the behavior of the pre- and post-neurons, while the third allows the introduction of an additional error signal. This third factor leads to chip implementations that are less compact, and possibly less energy efficient, than two-factor rules such as STDP (Payvand et al., 2020). Here we propose a different approach for learning complex tasks with local rules. Instead of starting from backpropagation and modifying it, we start from equilibrium propagation (Scellier and Bengio, 2017), which is intrinsically local in space and offers key advantages for hardware (Kendall et al., 2020; Zoppo et al., 2020). Equilibrium propagation theoretically applies to any physical system whose dynamics derive from an energy function. For input data patterns, the system is made to relax toward states of minimal energy; nudging the outputs toward the targets then reduces the prediction error. The resulting weight updates match those of backpropagation through time (BPTT) for static inputs (Ernoult et al., 2019), and equilibrium propagation reaches good accuracy on image benchmarks up to CIFAR-10 (Canadian Institute For Advanced Research) (Laborieux et al., 2021). It uses the same set of weights for the forward and backward passes, which is biologically plausible and interesting for computing, as it decreases the number of devices to update and thus the overall energy consumption. Contrary to backpropagation, the two phases rely on the same type of computation, another highly desirable feature that greatly simplifies circuits. Equilibrium propagation is, however, originally a rate-based algorithm. Here, we design EqSpike, a spiking, hardware-friendly version of equilibrium propagation compatible with current online-learning neuromorphic chips (Schemmel et al., 2010; Furber et al., 2014; Qiao et al., 2015; Davies et al., 2018; Frenkel et al., 2019; Ishii et al., 2019; Park et al., 2020).
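To make the two relaxation phases referred to above concrete, the generic equilibrium propagation setting can be summarized as follows (a sketch following Scellier and Bengio, 2017, where E is the energy, C the cost measured at the outputs, and β the nudging strength):

    \frac{du}{dt} = -\frac{\partial E}{\partial u}    \quad \text{(free phase: relax to the fixed point } u^{f}\text{)}

    \frac{du}{dt} = -\frac{\partial E}{\partial u} - \beta\,\frac{\partial C}{\partial u}    \quad \text{(nudging phase: relax to } u^{\beta}\text{)}

    \Delta W \propto -\frac{1}{\beta}\left( \left.\frac{\partial E}{\partial W}\right|_{u^{\beta}} - \left.\frac{\partial E}{\partial W}\right|_{u^{f}} \right)

Contrasting the two equilibria thus yields an estimate of the loss gradient using only quantities available locally at each synapse.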
EqSpike learns online and in real time: contrary to backpropagation, neither gradients nor activations need to be stored in external memories, and the synapses are updated directly on spike events. We simulate a fully connected architecture on the MNIST digit database and obtain a test accuracy of 97.6%, which compares well with backpropagation-derived methods for spiking networks and is on par with rate-based equilibrium propagation trained on graphics processing units (GPUs). We also show that the weight updates take the form of STDP, yielding insights into the link between this convergent learning scheme and biological plasticity.

Input neurons are clamped to the input values, while all the other neurons, connected by bidirectional synapses, evolve dynamically in two phases: a free phase and a nudging phase. In the free phase, which performs inference, the network is left to reach equilibrium (Figure 1A). Once this is done, the inputs are kept clamped and the output neurons are nudged toward the desired targets (Figure 1B). During the nudging phase, the error at the output layer is converted into a "force" acting upon the output neurons and propagating to the rest of the network until a second equilibrium is reached. The weight updates can be computed from values obtained by probing each neuron after the two phases, or continuously during the nudging phase (Ernoult et al., 2020, arXiv:2005.04168); this scheme has been shown numerically to approximate BPTT and recently came within 1% of BPTT on convolutional architectures in its original formulation, where the dynamical variables evolve smoothly in time. Our leaky integrate-and-fire neurons are described by a Hopfield-like energy function

    E(u) = \frac{1}{2}\sum_i u_i^2 - \frac{1}{2}\sum_{i \neq j} W_{ij}\,\rho(u_i)\,\rho(u_j) - \sum_i b_i\,\rho(u_i),

where u are the membrane potentials and ρ their activation function. The equilibrium propagation weight update follows

    \Delta W_{ij} \propto (\rho_i \rho_j)^{n} - (\rho_i \rho_j)^{f},

where the product ρ_i ρ_j is measured at equilibrium, at the end of the nudging (n) and free (f) phases. This rule was extended to the case where the weights are updated continuously during the nudging phase (Ernoult et al., 2020):

    \frac{dW_{ij}}{dt} \propto \dot{\rho}_j\,\rho_i + \dot{\rho}_i\,\rho_j,    (Equation 1)

where W_ij is the weight connecting neurons i and j, and ρ_i and ρ_j are their firing rates. To compute this update, the rate derivative of the post-neuron is multiplied by the rate of the pre-neuron, which is directly encoded in its spikes. The spike-based reformulation is the following: at each spike of neuron i, the synapse should be updated by a quantity proportional to the rate derivative of neuron j (first term of Equation 1), and reciprocally. We propose here a simple strategy to implement this rule in electronic hardware, illustrated in Figure 1C: a circuit including the elements of the neuron as well as dedicated blocks that extract, in real time, the rates and their derivatives from the spike trains, in order to update the synaptic weights accordingly. The firing frequency of the leaky integrate-and-fire (LIF) neurons as a function of the input current approximates the hard sigmoid prescribed by equilibrium propagation, and their maximum firing rate is f_max = 1/T_refract, where T_refract is the refractory period (see supplemental information for details and parameter values). The novelty with respect to this standard scheme is the block extracting the rate derivative, or "acceleration", shown in Figure 1E. An integrator with leak τ_LI (without reset at spikes) takes the spike train emitted by a neuron and outputs a slowly varying signal proportional to the rate of the train, V_LI ∝ ρ/τ_LI (Navarro et al., 2020). To obtain the derivative, this signal is delayed by a duration δ and subtracted from its actual value:

    V_{delay} = V_{LI}(t) - V_{LI}(t-\delta) \approx \delta\,\frac{\partial V_{LI}}{\partial t} \propto \frac{\delta\,\dot{\rho}}{\tau_{LI}}.

A low-pass filter then smooths out the variations; in the simulations, we average over N_filt simulation steps,

    \bar{x}(t) = \frac{1}{N_{filt}} \sum_{i=0}^{N_{filt}-1} x(t - i\,dt),

where dt is the simulation step. The smoothed signal is scaled by a coefficient η_r: at each spike of neuron j, the weight update is Δw_ij = η_r (δ/τ_LI) ρ̇̄_i, which corresponds to an effective learning rate lr = η_r δ/τ_LI = 1.5 × 10⁻³. The integrators, delays, and low-pass filters can be implemented efficiently in Complementary Metal Oxide Semiconductor (CMOS) technology (Mead and Ismail, 1989), and the bidirectional synapses with CMOS or emergent nano-devices such as memristors (Ishii et al., 2019; Marković et al., 2020; Wan et al., 2020).
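A minimal sketch of this rate-derivative extraction in discrete time is given below, assuming a spike train represented as a 0/1 array per simulation step. The leaky integrator, the delay-and-subtract stage, and the moving-average filter mirror the blocks described above; the function name, parameter values, and scaling constants are illustrative assumptions rather than the paper's settings.

    import numpy as np

    def rate_and_derivative(spikes, dt=1e-4, tau_li=2e-2, delay_steps=10, n_filt=20):
        """Estimate a neuron's rate and its smoothed derivative from a spike train.

        spikes      : 1-D array of 0/1 spike events per simulation step
        tau_li      : leak time constant of the integrator (no reset at spikes)
        delay_steps : delay used in the delay-and-subtract derivative estimate
        n_filt      : length of the moving-average (low-pass) filter
        """
        n = len(spikes)
        v_li = np.zeros(n)  # leaky-integrator output, proportional to the firing rate
        for t in range(1, n):
            v_li[t] = v_li[t - 1] * (1.0 - dt / tau_li) + spikes[t]

        # Delay-and-subtract: V_LI(t) - V_LI(t - delay) ~ delay * dV_LI/dt
        v_delay = np.zeros(n)
        v_delay[delay_steps:] = v_li[delay_steps:] - v_li[:-delay_steps]

        # Causal moving-average low-pass filter to smooth the derivative estimate
        kernel = np.ones(n_filt) / n_filt
        rho_dot_bar = np.convolve(v_delay, kernel, mode="full")[:n]

        return v_li, rho_dot_bar

    # Example: a spike train whose rate ramps up gives a positive smoothed derivative.
    rng = np.random.default_rng(0)
    ramp_rate = np.linspace(0.01, 0.2, 5000)  # spike probability per step
    spike_train = (rng.random(5000) < ramp_rate).astype(float)
    rate, rate_derivative = rate_and_derivative(spike_train)
    print(rate[-1], rate_derivative[-1])

The smoothed derivative returned here is the quantity that, in the rule above, multiplies the spikes of the neuron on the other side of the synapse.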
The pseudocode for EqSpike is given in Algorithm 1 (see supplemental information for details).

Algorithm 1: EqSpike procedure for one image
Inputs: image, model (n_inputs, n_hidden, n_out), loss function, phase lengths T_free and T_nudge, parameters τ_LIF, τ_LI, u_th, δ, η_r, β, N_filt, W_ij
For t ∈ [0, T_free] (free phase):
    For each neuron j: update u_j(τ_LIF, I_j)
        If u_j > u_th: emit a spike (t_j) and update ρ_j(t_j, τ_LI)
For t ∈ [T_free, T_free + T_nudge] (nudging phase):
    For each output neuron o: compute ∂e_o and nudge the neuron: u_o ← u_o − β ∂e_o
    For each neuron k: update u_k(τ_LIF, I_k)
        If u_k > u_th: emit a spike (t_k) and update ρ_k(t_k, τ_LI)
        Compute the smoothed derivative ρ̇̄_k from (ρ_k(t_k), ρ_k(t_k − δ)), …, N_filt
    For each synapse w_ij (update synapses):
        If neuron j emits a spike: w_ij ← w_ij + η_r (δ/τ_LI) ρ̇̄_i
        If neuron i emits a spike: w_ij ← w_ij + η_r (δ/τ_LI) ρ̇̄_j
Return: the trained W_ij for the next image/next epoch

We now evaluate the performance of EqSpike on the MNIST (Mixed National Institute of Standards and Technology) task, with one hidden layer (see supplemental information for details). Figure 2 shows the obtained train (orange) and test (blue) accuracies as a function of epochs, with the standard deviation over six runs shown as a shaded color. Table 1 compares our results with the algorithms closest to our implementation, BPTT and continual equilibrium propagation (C-EP), trained with a batch size of one and the same initialization procedure.

Table 1. Comparison with BPTT and C-EP (batch size = 1, 6 runs, same initialization procedure)
Algorithm            Architecture   MNIST test         MNIST train
BPTT                 784-100-10     97.11% ± 0.23%     99.06% ± 0.15%
Continual Eq-Prop    784-100-10     96.97% ± 0.12%     99.8% ± 0.04%
EqSpike 100          784-100-10     96.87% ± 0.18%     98.59% ± 0.03%
EqSpike 300          784-300-10     97.59% ± 0.1%      98.91% ± 0.03%

With a batch size of one, EqSpike closely matches the accuracy of gradient-descent training of the same architecture, within the error margin. With 300 hidden neurons, EqSpike reaches 97.59%. Fully connected spiking networks trained directly, without conversion from non-spiking networks, typically achieve accuracies in the 96–98% range (O'Connor and Welling, 2016; Lee et al., 2016; Mostafa, 2018; Tavanaei and Maida, 2019), so EqSpike is comparable to the latest learning approaches investigated for spiking platforms. This benchmark was chosen because it is the one used by the community interested in online learning, due to the long simulation times involved. As EqSpike matches its rate-based equivalent on this baseline, it has the potential, like the latter, to adapt to and perform well on more complex tasks, with the advantage of being implementable in spiking technologies.

For a spiking algorithm and its applications, it is important to quantify the number of spikes needed to reach a given accuracy: operation with fewer spikes is desirable, as it reduces both the execution time and the energy consumption. Figure 3 shows the test accuracy as a function of inference time expressed in spikes per neuron (t × f_max). The orange line is the mean result over the whole test set, obtained by averaging the output firing rates over a window T_average of 100 steps (T_average = 50/f_max) and considering that the output neuron with the highest rate encodes the result. This method computes the output accurately, at the expense of having to wait and letting the neurons fire multiple spikes. Spiking networks also offer the possibility to accelerate the computation by determining the class from the first output spike. The blue line shows the accuracy of this readout averaged over images, and the red, vertical dotted line indicates ...
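As a companion to Algorithm 1, the sketch below renders one free phase followed by one nudging phase in plain Python on a toy symmetric network. All sizes, time constants, the Poisson input encoding, and the learning-rate value are illustrative assumptions, and the layered 784-100-10 architecture of the paper is replaced by a small fully recurrent one for brevity.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy dimensions and parameters (illustrative, not the paper's values)
    n_in, n_hid, n_out = 4, 8, 2
    dt, tau_lif, tau_li = 1e-3, 20e-3, 50e-3
    u_th, t_refract = 1.0, 5e-3
    delay_steps, n_filt = 20, 50
    beta, eta_r, delta = 0.5, 1e-3, 20 * 1e-3
    T_free, T_nudge = 2000, 2000            # steps per phase

    n = n_in + n_hid + n_out
    W = rng.normal(0, 0.1, (n, n))
    W = (W + W.T) / 2                        # bidirectional (symmetric) synapses
    np.fill_diagonal(W, 0.0)

    x = rng.random(n_in)                     # one "image" (clamped input rates)
    target = np.array([1.0, 0.0])            # desired output rates

    u = np.zeros(n)                          # membrane potentials
    rho = np.zeros(n)                        # leaky-integrator rate estimates
    refract = np.zeros(n)                    # remaining refractory time
    rho_hist = np.zeros((delay_steps + n_filt, n))  # buffer for delayed rates
    out = slice(n_in + n_hid, n)

    for t in range(T_free + T_nudge):
        nudging = t >= T_free
        rho[:n_in] = x                       # input neurons are clamped
        I = W @ rho                          # synaptic input currents
        if nudging:
            # Nudge output neurons toward the targets ("force" on the outputs)
            I[out] += beta * (target - rho[out])
        # Leaky integrate-and-fire dynamics for hidden and output neurons
        free = refract[n_in:] <= 0
        u[n_in:] += free * dt / tau_lif * (-u[n_in:] + I[n_in:])
        spikes = (u > u_th) & (refract <= 0)
        spikes[:n_in] = rng.random(n_in) < x * dt / t_refract  # Poisson-like inputs
        u[spikes] = 0.0
        refract[spikes] = t_refract
        refract -= dt
        # Leaky integrator (no reset) tracking each neuron's firing rate
        rho[n_in:] += -dt / tau_li * rho[n_in:] + spikes[n_in:]
        # Smoothed rate derivative via delay-and-subtract + moving average
        rho_hist = np.roll(rho_hist, -1, axis=0)
        rho_hist[-1] = rho
        rho_dot_bar = (rho_hist[-n_filt:] - rho_hist[:n_filt]).mean(axis=0) / delta
        if nudging:
            # Event-driven two-factor updates (Equation 1): at each spike of one
            # neuron, move the synapse by the smoothed rate derivative of the other
            lr = eta_r * delta / tau_li
            W += lr * np.outer(spikes, rho_dot_bar)
            W += lr * np.outer(rho_dot_bar, spikes)
            np.fill_diagonal(W, 0.0)

    print("output rate estimates after nudging:", rho[out])

The event-driven update mirrors Equation 1: whenever a neuron spikes, every synapse it touches moves by an amount proportional to the smoothed rate derivative of the neuron on the other side.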
Similar resources
Feature Representations for Neuromorphic Audio Spike Streams
Event-driven neuromorphic spiking sensors such as the silicon retina and the silicon cochlea encode the external sensory stimuli as asynchronous streams of spikes across different channels or pixels. Combining state-of-the-art deep neural networks with the asynchronous outputs of these sensors has produced encouraging results on some datasets but remains challenging. While the lack of effective spi...
Spike propagation in driven chain networks with dominant global inhibition.
Spike propagation in chain networks is usually studied in the synfire regime, in which successive groups of neurons are synaptically activated sequentially through the unidirectional excitatory connections. Here we study the dynamics of chain networks with dominant global feedback inhibition that prevents the synfire activity. Neural activity is driven by suprathreshold external inputs. We anal...
Neuromorphic implementations of neurobiological learning algorithms for spiking neural networks
The application of biologically inspired methods in design and control has a long tradition in robotics. Unlike previous approaches in this direction, the emerging field of neurorobotics not only mimics biological mechanisms at a relatively high level of abstraction but employs highly realistic simulations of actual biological nervous systems. Even today, carrying out these simulations efficien...
Network-driven design principles for neuromorphic systems
Synaptic connectivity is typically the most resource-demanding part of neuromorphic systems. Commonly, the architecture of these systems is chosen mainly on technical considerations. As a consequence, the potential for optimization arising from the inherent constraints of connectivity models is left unused. In this article, we develop an alternative, network-driven approach to neuromorphic arch...
MPI- and CUDA- implementations of modal finite difference method for P-SV wave propagation modeling
Among different discretization approaches, the Finite Difference Method (FDM) is widely used for acoustic and elastic full-waveform modeling. An inevitable deficit of the technique, however, is its severe requirement for computational resources. A promising solution is parallelization, where the problem is broken into several segments, and the calculations are distributed over different processors. ...
Journal
Journal title: iScience
Year: 2021
ISSN: 2589-0042
DOI: https://doi.org/10.1016/j.isci.2021.102222